Incremental Data Fusion Based on Provenance Information

Authors

  • Carmem S. Hara
  • Cristina Dutra de Aguiar Ciferri
  • Ricardo Rodrigues Ciferri

Abstract

Data fusion is the process of combining multiple representations of the same object, extracted from several external sources, into a single, clean representation. It is usually the last step of an integration process, executed after the schema matching and entity identification steps. More specifically, data fusion aims at resolving attribute value conflicts based on user-defined rules. Although several approaches for fusing data exist in the literature, few of them focus on optimizing the process when new versions of the sources become available. In this paper, we propose a model for incremental data fusion. Our approach is based on storing provenance information in the form of a sequence of operations. These operations reflect the last fusion rules applied to the imported data. By keeping both the original source value and the new fused data in the operations repository, we are able to reliably detect source value updates and propagate them to the fusion process, which reapplies previously defined rules whenever possible. This approach reduces the number of data items affected by source updates and minimizes the amount of manual user intervention in future fusion processes.
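The core idea of the abstract is that each fused value is stored together with the rule that produced it and the original source values it was computed from. The sketch below illustrates that bookkeeping under stated assumptions: provenance is kept per (entity, attribute) pair, a source update is detected by comparing the new source values with the stored ones, and a stored rule is replayed automatically while a manual decision is flagged for the user. The names `ProvenanceLog`, `FusionOp`, `RULES`, and the two example rules are hypothetical illustrations, not the authors' model or rule language.

```python
from dataclasses import dataclass
from typing import Callable, Dict, Optional, Tuple

SourceValues = Dict[str, object]          # source id -> value for one attribute
Rule = Callable[[SourceValues], object]   # hypothetical user-defined fusion rule


def prefer_source_a(vals: SourceValues) -> object:
    """Trust source "A" when it has a value, otherwise fall back to any other source."""
    if vals.get("A") is not None:
        return vals["A"]
    return next(v for v in vals.values() if v is not None)


# Hypothetical rule catalogue; the paper's actual rule language is not shown here.
RULES: Dict[str, Rule] = {
    "max": lambda vals: max(v for v in vals.values() if v is not None),
    "prefer_A": prefer_source_a,
}


@dataclass
class FusionOp:
    """Provenance record for one fused attribute value."""
    rule: Optional[str]          # rule name in RULES, or None for a manual decision
    source_values: SourceValues  # original source values seen at fusion time
    fused_value: object          # value written to the fused output


class ProvenanceLog:
    """Operations repository keyed by (entity id, attribute name)."""

    def __init__(self) -> None:
        self.ops: Dict[Tuple[str, str], FusionOp] = {}

    def fuse(self, entity: str, attr: str, vals: SourceValues, rule: str) -> object:
        """Resolve a conflict with a named rule and record the operation."""
        fused = RULES[rule](vals)
        self.ops[(entity, attr)] = FusionOp(rule, dict(vals), fused)
        return fused

    def record_manual(self, entity: str, attr: str, vals: SourceValues, value: object) -> None:
        """Record a value chosen by the user; it cannot be replayed automatically."""
        self.ops[(entity, attr)] = FusionOp(None, dict(vals), value)

    def apply_update(self, entity: str, attr: str, vals: SourceValues) -> Tuple[str, object]:
        """Propagate a new source version to one attribute.

        Returns (status, value) where status is one of:
          "new"        no provenance yet, a full fusion step is needed
          "unchanged"  source values match the stored ones, nothing to do
          "reapplied"  the stored rule was replayed on the new values
          "manual"     values changed but the last decision was manual
        """
        op = self.ops.get((entity, attr))
        if op is None:
            return ("new", None)
        if vals == op.source_values:
            return ("unchanged", op.fused_value)
        if op.rule is not None:
            op.fused_value = RULES[op.rule](vals)
            op.source_values = dict(vals)
            return ("reapplied", op.fused_value)
        # Keep the stale record so the user can compare old and new source values.
        return ("manual", op.fused_value)
```

With this bookkeeping, an unchanged source value costs only a comparison, a rule-based value is recomputed without user involvement, and only values the user resolved by hand are flagged again, which mirrors the abstract's claims of fewer affected items and less manual intervention. Comparing whole per-attribute value sets is a simplifying assumption; the operation model in the paper may differ.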

Similar resources

A Traceable Data Fusion Based on Data Provenance

Data fusion is a hot topic in data integration which at least includes the two stages: entity resolution and data conflict resolution. However, the existing fusion process is transparent and the fusion stages are isolated. So in this paper, we proposed a traceable data fusion mechanism based on data provenance which can trace the data sources of fusion results and the evolutionary process. The ...

Distributed and Cooperative Compressive Sensing Recovery Algorithm for Wireless Sensor Networks with Bi-directional Incremental Topology

Recently, the problem of compressive sensing (CS) has attracted lots of attention in the area of signal processing. So, much of the research in this field is being carried out in this issue. One of the applications where CS could be used is wireless sensor networks (WSNs). The structure of WSNs consists of many low power wireless sensors. This requires that any improved algorithm for this appli...

Query capabilities of the Karma provenance framework

Provenance metadata in e-Science captures the derivation history of data products generated from scientific workflows. Provenance forms a glue linking workflow execution with associated data products, and finds use in determining the quality of derived data, tracking resource usage, and for verifying and validating scientific experiments. In this article, we discuss the scope of provenance coll...

Update Exchange with Mappings and Provenance

We consider systems for data sharing among heterogeneous peers related by a network of schema mappings. Each peer has a locally controlled and edited database instance, but wants to ask queries over related data from other peers as well. To achieve this, every peer’s updates propagate along the mappings to the other peers. However, this update exchange is filtered by trust conditions — expressi...

Selective and incremental fusion for fuzzy and uncertain data based on probabilistic graphical model

Active and dynamic fusion for fuzzy and uncertain data have key challenges such as high complexity and difficult to guarantee accuracy, etc. In order to resolve the challenging issues, in this article a selective and incremental data fusion approach based on probabilistic graphical model is proposed. General Bayesian networks are adopted to represent the relationship among the data and fusion r...

Publication date: 2013